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Abstract 

At  the  functional  level,  all  biological  processes  in 
cells  can  be  represented  as  a series  of  biochemical 
reactions  that  are  stochastic  in  nature.  We  have 
developed  a software  package  called  Biomolecular 
Network  Simulator  (BNS)  that  uses  a stochastic  approach 
to  model  and  simulate  complex  biomolecular  reaction 
networks.  Two  simulation  algorithms — the  exact 
Gillespie  stochastic  simulation  algorithm  and  the 
approximate  adaptive  tau-leaping  algorithm — are 
implemented  for  generating  Monte  Carlo  trajectories  that 
describe  the  evolution  of  a system  of  biochemical 
reactions.  The  software  uses  a combination  of  MATLAB 
and  C-coded  functions  and  is  parallelized  with  the 
Message  Passing  Interface  (MPI)  library  to  run  on 
multiprocessor  architectures. 

1.  Introduction 

All  chemical  reactions  require  the  interaction  of  the 
reactants  with  sufficient  energy  to  overcome  the 
activation  energy  barrier  and,  fundamentally,  they  are 
stochastic  in  nature.  The  best  way  to  model  kinetics  of  a 
system  of  chemical  reactions  is  to  use  a stochastic 
approach  in  terms  of  the  Chemical  Master  Equation 
(CME),  with  the  number  of  molecules  of  each  molecular 
species  as  variables.  The  CME  describes  transitions  of 
the  system  from  one  state  to  another  state  based  on 
probabilistic  methods.  Gillespie  proposed  a method  to 
simulate  probabilistically-correct  trajectories  based  on  the 
CME  through  the  use  of  Monte  Carlo  methods1'1. 

Although  the  Gillespie  stochastic  algorithm  is  a 
method  for  exact  simulations,  as  it  explicitly  counts  each 


reaction  event  that  occurs  in  the  system,  the  accuracy  of 
the  method  comes  at  a high  computational  cost.  This  is 
especially  true  for  systems  with  a large  number  of 
molecular  species,  where  reactions  occur  numerous  times 
in  a short  period  of  time.  To  accelerate  discrete  stochastic 
simulation,  Gillespie  proposed  the  tau-leaping  method  as 
an  approximate  simulation  strategy121.  By  using  Poisson 
random  numbers,  the  tau-leaping  method  can  leap  over 
many  reactions  without  a significant  loss  of  accuracy. 

We  developed  a software  package — the  BNS— that 
can  use  the  Gillespie  stochastic  algorithm  or  the  tau- 
leaping  algorithm  to  simulate  the  behavior  of  a system  of 
biochemical  reactions.  It  allows  scientists  to  build  a 
synthetic  biomolecular  network  and  explore  its 
performance  utilizing  the  capacities  of  high  performance 
computing  (HPC).  BNS  contains  tools  for  both 
simulating  the  system  and  for  analyzing  the  results  of  the 
simulation.  Since  the  simulations  are  stochastic  in  nature, 
the  user  must  run  thousands  of  simulations  to  characterize 
the  “ensemble”  behavior  of  biological  systems.  The 
parallelized  BNS  code  allows  users  to  run  multiple 
simulations  and  store  results  on  multiprocessor  platforms. 
In  this  paper,  we  present  a brief  description  of  the  BNS 
software  along  with  some  examples. 

2.  Stochastic  Simulation  Algorithm 

Let  us  consider  a system  composed  of  N well-mixed 
chemical  species,  S,  (/  = 1,. . .,N),  in  a fixed  volume  V, 
which  are  involved  in  M reactions,  R/j=  1,...,M).  The 
dynamical  state  of  the  system  can  be  specified  by  the  state 
vector  X(t)  = (X](t),  X2(t),...,XN(t)),  where  XJt)  is  the 
number  of  molecules  of  species  S,  at  time  t. 
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The  probability  for  each  reaction  R,  is  defined  by  a 
propensity  function  aj(X)  = Cj  ■ Hj(X ),  where  Cj  is  the 
stochastic  probability  constant  and  Hj(X)  represents  the 
number  of  possible  combinations  of  reactants.  The 
probability  that  reaction  Rj  will  occur  in  the  time  interval 
dt  is  defined  as  ajfti)dt.  Each  reaction  is  also 
characterized  by  its  state-change  vector  y,  = (vtj, 
v2j,  ...,vNj),  where  Vj,  is  the  change  in  the  number  of 
molecules  S,  caused  by  one  reaction  Rr 

To  study  the  evolution  of  the  state  vector  X(t), 
Gillespie  proposed  an  algorithm  for  Monte  Carlo 
generation  of  stochastic  trajectories1'1.  The  direct 

simulation  algorithm,  which  is  implemented  in  BNS, 
answers  two  questions:  (1)  which  reaction  will  occur 

next?  and  (2)  what  is  the  waiting  time  for  the  next 
reaction  to  occur? 

To  answer  these  questions,  two  random  numbers,  r, 
and  r2,  uniformly  distributed  over  the  interval  (0,1)  are 
generated.  The  first  random  number  is  used  to  determine 
the  next  reaction  Rj , such  that 

j- 1 j 

. <rl  -a0  <YJa,.  (1) 


«0  =Xv  (2> 

7=1 

The  distribution  of  the  waiting  time  is  given  by  following 
probability  density  function 

= exp(-a0r).  (3) 

Here,  P(t,  j ) is  the  probability  that  the  waiting  time 

for  the  reaction  is  x and  that  it  will  be  an  Rj  reaction.  The 
waiting  time  for  the  next  reaction  is  calculated  as1'1 

T= — log—.  (4) 

«0  r2 

After  the  next  reaction  and  its  waiting  time  are 
determined,  the  reaction  is  executed  and  the  state  of  the 
system  is  changed  according  to  the  state-change  vector  v 
The  simulation  time  is  increased  by  x and  the  next 
simulation  step  is  generated. 

The  Gillespie  stochastic  algorithm  is  exact  for  the 
elementary  reactions,  uni-uni,  uni-bi,  and  bi-uni,  with  any 
number  of  molecules  in  the  system.  For  a system  with 
large  numbers  of  molecules,  the  trajectories  generated  by 
stochastic  Monte  Carlo  simulations  converge  to  the 
trajectory  generated  by  deterministic  differential 
equations1 'l 


3.  TAU-Leaping  Algorithm 


The  tau-leaping  method  calculates  a time  interval  x, 
which  encompasses  more  than  one  reaction  event  and 
satisfies  the  Leap  Condition,  i.e.,  the  expected  state 
change  induced  by  the  leap  must  be  sufficiently  small  that 
no  propensity  function  changes  its  value  by  a significant 
amount.  Several  methods  have  been  proposed  recently  to 
choose  the  size  of  the  time  interval  for  tau-leaping13  51. 
We  implemented  the  tau-leaping  method  proposed  by  Cao 
et  al.161.  In  that  method,  a tau  selection  formula  is  given 
by 


r = mm  •< 

iel, 


j max{fx,./g,.,l}  max  {gar.  / g,,l}2 

|y“i(*)|  ’ ofW 


(5) 


where  g,  is  the  highest  order  of  reaction  in  which  species 
Sj  appears  as  reactant,  e is  an  error  control  parameter,  lrs  is 
the  set  of  indices  of  all  reactant  species,  and 

Mi(x)  = 'LVuajW>  (6*> 

j 

af(x)  = YJyfjcij(x).  (7) 

j 


After  a time  interval  x'  has  been  selected,  the  number 
of  firings  of  each  reaction  channel  during  this  time 
interval  is  approximated  as  a Poisson  random  variable. 
The  Poisson  random  variable  can  have  an  arbitrarly  large 
value  and  the  population  of  some  of  the  molecular  species 
is  allowed  to  temporarily  attain  negative  values.  To  avoid 
negative  populations  we  used  a modified  tau-leaping 
algorithm  proposed  in  Reference  7.  With  this  approach  a 
second  control  parameter  nc  is  introduced;  a positive 
integer  typically  in  the  range  of  5-20.  If  a number  of 
molecular  species  becomes  less  than  nc,  all  reactions  in 
which  that  species  appears  as  a reactant  are  defined  as 
critical  reactions.  These  reactions  are  simulated  by  the 
stochastic  simulation  algorithm.  We  calculate  the  sum  of 
propensity  functions  of  all  the  critical  reactions  acQ  and 
generate  a second  candidate  time  x"  according  to 

" 1 t 1 

* = — log-.  (8) 

an  r 


The  actual  time  leap  x is  selected  as  the  smaller  of  x' 
and  x".  If  t = t',  a number  of  firings  kt  is  generated  as  a 
Poisson  random  variable  with  mean  a,(x)  r for  all 
noncritical  reactions  and  the  reactions  are  executed;  no 
critical  reaction  is  executed  in  this  case.  If  x = x",  we 
generate  k,  and  fire  noncritical  reactions  as  in  the  previous 
case;  also,  another  random  number  with  uniform 
distribution  is  generated  to  find  which  critical  reaction 
needs  to  be  fired  according  to  Eq.  (1). 
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Evaluations  of  the  performance  of  the  tau-leaping 
algorithm  shows  a 2-3  fold  speedup  in  simulations 
compared  to  the  exact  stochastic  algorithm,  while 
maintaining  excellent  accuracy  with  regards  to  both  rare 
and  frequent  events. 

4.  Biotnolecular  Network  Simulator 

The  Biomolecular  Network  Simulator  uses  a 
combination  of  MATLAB  and  C-coded  modules.  The 
front-end,  graphical  user  interface  (GUI)  and  analysis 
tools  of  BNS  are  written  in  MATLAB,  while  the 
simulation  core  engine  is  written  in  C.  Such  a 
combination  allows  one  to  use  the  interactive  features  and 
visualization  tools  of  MATLAB,  while  achieving  high 
speed  for  the  computationally  intensive  part  of  the 
software.  The  parallelization  of  the  code  is  done  with  the 
help  of  a MPI  library.  The  BNS  can  be  run  on  any 
computer  platform  where  MATLAB  6.5  or  newer  is 
installed. 

4.1.  Input  Data 

A model  is  a set  of  mathematical  relationships  that 
describe  the  behavior  of  biochemical  reactions  that 
control  cellular  biological  processes.  Each  of  the  ‘model’ 
directories  contains  one  or  more  subdirectories  with 
model  description  files  and/or  different  sets  of  parameters 
for  the  same  model  and  an  ‘output’  directory  where  the 
results  of  the  simulations  are  stored.  There  are  two  types 
of  model  directories:  one  for  models  defined  in  the 
Systems  Biology  Markup  Language  (SBML)  format181 
and  one  for  models  defined  as  a set  of  MATLAB  m-files, 
referred  to  as  the  BNS  format.  In  addition,  BNS  allows 
one  to  perform  simulations  with  multiple  parameter  sets, 
with  each  parameter  set  being  run  multiple  times. 
Simulations  with  multiple  parameter  sets  can  be  used  for 
optimization  and  sensitivity  analysis  of  the  model. 

4.2.  Output  Data 

There  are  two  types  of  output  files:  snapshot  data 
and  event  log  data.  Snapshot  data  files  contain  the  state 
of  the  system  (number  of  molecules  of  each  molecular 
species)  at  user-specified  time  intervals.  The  information 
stored  in  the  snapshot  files  is  used  to  create  runtime 
interactive  graphics  and  for  post  hoc  analysis  of  the  data. 
The  second  type  of  output  files,  the  event  log  files, 
contain  the  record  of  every  discrete  event  that  occurs 
during  the  simulation.  The  user  should  be  aware  that 
event  log  files  may  require  considerable  memory  or  hard 
disk  space  and,  therefore,  may  create  memory 
management  problems  for  simulations  involving  a large 
number  of  long  runs  or  for  large  reaction  networks. 


4.3.  Parallelization 

The  parallelization  of  the  BNS  code  is  accomplished 
using  the  MPI  library.  In  our  parallelization  scheme,  the 
‘master’  processor  divides  the  total  number  of  user- 
specified  runs  among  the  available  processors,  sends  a set 
of  jobs  to  each  of  the  ‘workers’,  and  performs  some  of  the 
simulation  runs  itself.  In  this  approach  to  parallelization, 
sometimes  called  “embarrassingly  parallel”,  we  reduce 
the  communication  among  nodes  and  increase  the 
speedup  of  multiple  simulations.  To  test  BNS  scaling 
with  the  number  of  processors,  we  ran  1 ,000  simulations 
on  an  SGI  Origin  3900  machine  at  the  Aeronautical 
Systems  Center  Major  Shared  Resource  Center,  which  has 
a shared  memory  architecture.  A simulation  is  defined  as 
a single  run  of  the  mathematical  model  for  a particular 
biochemical  reaction  network.  A 92-fold  speedup  was 
observed  when  running  BNS  on  100  processors 
(Figure  1). 


4.4.  Running  the  Simulations 

The  BNS  can  be  run  either  in  command  line  mode  or 
via  a GUI.  The  GUI  allows  the  user  to  modify  model 
parameters  at  runtime  and  to  execute  simulations  in  the 
interactive  or  batch  mode  on  HPC  resources. 

The  main  dialog  window  of  the  BNS  GUI  is  shown 
in  Figure  2.  It  allows  the  user  to  select  the  appropriate 
‘Model’  and  ‘Parameters’  directories  and  set  the  ‘Run’ 
mode.  A click  on  the  ‘Details’  button  next  to  the 
‘Parameters’  directory  opens  a new  window,  shown  in 
Figure  2(b).  This  dialog  window  allows  the  user  to 
modify  model  parameters  and  to  set  parameters  for  the 
simulation. 

If  simulations  are  run  in  an  interactive  mode,  partial 
results  of  the  simulations  will  appear  on  the  screen  during 
the  run.  Usually,  HPC  centers  allocate  limited  resources 
(in  number  of  processors  and  running  time)  for  interactive 
simulations.  Therefore,  in  addition  to  running  in  an 
interactive  mode,  BNS  can  also  run  in  ‘batch’  mode.  In 
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this  mode,  all  output  data  are  stored  on  the  hard  drive  for 
further  analysis. 


Figure  2.  The  screen  shots  of  the  BNS  GUI  dialog  windows, 
(a)  The  main  BNS  dialog  window,  (b)  The  parameters  dialog 
window,  which  allows  users  to  modify  the  model  parameters 
and  to  set  simulation  parameters. 


containing  transcription,  translation  and  metabolic 
reactions.  The  simulations  were  carried  out  using  two 
different  algorithms  implemented  in  BNS:  Gillespie 
stochastic  algorithm  and  tau-leaping  algorithm  with 
parameters  e = 0.1  and  nc  = 5.  Figure  3 shows  the 
between-run  average  of  the  time-weighted  average 
number  of  molecules  for  200  runs  using  a time-averaging 
interval  of  50  sec.  The  average  number  of  molecules  of 
species  SI  changed  in  the  range  of  -0-20  molecules, 
while  the  average  number  of  molecules  of  species  S2 
changed  -0-2,500  molecules.  Results  obtained  using  an 
approximate  tau-leaping  algorithm  are  almost 
indistinguishable  from  the  results  of  the  exact  Gillespie 
algorithm  for  both  types  of  species.  On  the  other  hand, 
using  the  tau-leaping  algorithm  speeds  up  simulations 
more  than  3-fold  compared  to  the  exact  stochastic 
algorithm. 


4.5.  Analysis 

BNS  has  a comprehensive  set  of  tools  for  post- 
simulation analysis.  A GUI  for  the  analysis  tools  allows 
the  user  to  easily  select  the  data  and  to  set  conditions  for 
the  analysis.  Multiple  types  of  post-simulation  analyses 
are  available. 

4.5.1.  Plots  of  Number  of  Molecules  vs.  Time 

The  most  frequently  used  type  of  analysis  is  a plot  of 
the  number  of  molecules  vs.  time.  Such  plots  are 
available  in  the  interactive  mode  or  as  post-simulation 
analysis.  There  are  two  ways  to  create  plots:  each 
compound  is  plotted  on  a separate  figure  or  multiple 
compounds  are  plotted  on  the  same  figure  window 
(grouping  mode).  The  number  of  molecules  vs.  time  plots 
can  be  created  with  both  types  of  output  files:  snapshot 
data  or  event  log  data. 

4.5.2.  Time-Weighted  Average  Analysis 

A time-weighted  average  analysis  refers  to  the 
calculations  of  the  average  number  of  molecules  of  a 
particular  molecular  species  during  a user-selected  time- 
averaging interval.  The  average  is  weighted  according  to 
the  amount  of  time  the  compound  exists  in  each  state 
during  the  selected  time-averaging  interval.  The  time- 
weighted  average  is  then  plotted  versus  time.  The 
averaging  analysis  can  be  performed  for  a single  run  or 
for  a selected  set  of  runs.  When  the  analysis  is  applied  to 
multiple  runs,  the  plot  shows  the  between  run  average  (the 
average  of  each  individual  time-weighted  average)  and 
standard  deviation. 

The  graphs  in  Figure  3 show  the  behavior  of  two 
molecular  species,  SI  and  S2,  over  the  time  interval  of 
2,000  seconds  for  a biomolecular  reaction  network, 


Figure  3.  The  averaged  number  of  compounds  SI  and  S2  in 
the  simulation  interval  (0,  2,000)  obtained  using  the  exact 
Gillespie  stochastic  algorithm  and  approximate  tau-leaping 
algorithm  (e=0.1  , nc  =5).  The  time-weighted  average  was 
calculated  using  a 50-sec  time-averaging  interval  and  the 
individual  time-weighted  averages  were  averaged  over  the 
200  simulation  runs.  Data  for  the  mean  ± SD  are  shown. 

4.5.3.  Reaction  Frequency  Analysis 

Complex  biomolecular  reaction  networks  usually 
contain  reactions  that  occur  on  different  time  scales: 
some  reactions  have  a low  propensity  and  occur  rarely, 
while  other  reactions  are  very  fast  and  occur  frequently. 
The  data  stored  in  the  event  log  files  allow  the  user  to 
perform  various  reaction-frequency  analyses  of  the 
simulation  data  to  learn  more  about  the  basic  nature  of  the 
system.  One  type  of  analysis  creates  plots  of  the  total 
number  of  times  each  reaction  in  the  network  occurs 
during  the  simulation. 

A second  type  of  analysis  is  the  average  and  standard 
deviation  of  the  time-averaged  reaction  frequency  in  each 
reaction  channel.  Figure  4 shows  an  example  of  the  time- 
averaged  reaction  frequency  for  two  reaction  channels 
averaged  over  the  200  runs  obtained  by  running 
simulations  with  the  two  algorithms.  The  reaction  rxn21 
belongs  to  the  group  of  “slow”  reactions  with  the 
frequency  of  occurrences  in  the  range  of  0-20  firings  per 
second.  The  reaction  rxn24  is  a “fast”  reaction  and 
reached  500-600  occurrences  per  second.  As  in  the  case 
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of  average  number  of  molecules,  the  tau-leaping 
algorithm  shows  an  excellent  agreement  with  the  results 
from  exact  stochastic  simulations  for  this  particular 
biomolecular  reaction  network. 


Figure  4.  The  averaged  number  of  reaction  occurrences  per 
time-averaging  interval  for  the  reaction  channels  rxn21  and 
rxn24  obtained  using  the  exact  Gillespie  stochastic  algorithm 
and  approximate  tau-leaping  algorithm  (e=  0.1,  nc=  5).  The 
time-averaged  rate  was  calculated  using  a 5-sec  time- 
averaging interval  and  the  individual  time-averaged  rates 
were  averaged  over  the  200  simulation  runs.  Data  for  the 
mean  ± SD  are  shown. 

5.  Conclusions 

The  BNS  allows  the  users  to  simulate  the  behavior  of 
complex  biological  processes  utilizing  the  capabilities  of 
HPC  systems.  Some  of  the  features  that  distinguish  BNS 
from  similar  tools  include: 

• usage  of  MATLAB  and  C-coded  functions 

allows  the  user  to  combine  intensive 

visualization  of  data  with  high  speed 

computations; 

• parallelized  code  for  multiple  simultaneous 
simulations  allows  the  user  to  run  BNS  on 
multiprocessor  machines; 

• options  to  run  the  code  in  the  interactive  or  batch 
mode; 

• user  friendly  graphical  user  interface  allows  the 
user  to  easily  set  and  modify  parameters  of  the 
model,  simulation  and  analysis;  and 

• comprehensive  tool  sets  provide  for  post- 
simulation analysis  of  results. 
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